RPC delay metrics #2293
Conversation
@scylladb/seastar-maint, please review
Can you write a bit about the motivation for this specific metric? Is it something you have seen in some other API and found useful (if so, a reference would be nice)? What do you want to use this new feature for? What bothers me is that the subtraction you're doing - latency measured by the client minus latency measured by the server - isn't exactly the RTT of anything: it isn't the RTT of the full request (because that would include the handling time, not subtract it), and it's not the "classic" network RTT because it includes queueing time on both the client and the server.
When looking for sources of network latency in scylladb/scylladb#9505 it was good to know how much time it takes for an RPC frame to pass through seastar(send)-kernel(send)-network-kernel(recv)-seastar(recv)-and_back (for the record, one of the problematic places found was in seastar(send) -- #1067), so the idea to collect it as a metric popped up.
Why not?
If I measure timestamps at points 1 and 13, their difference is the total time it took to process the request. If I measure the time step 7 took and subtract it from that calculated difference, I get exactly the time it took the request to travel through seastar and the kernels on both sides and the network in between.
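A minimal sketch of that subtraction, using the point/step numbering above (the names are illustrative, not the actual seastar code):

```cpp
#include <chrono>
#include <cstdint>

// Total round trip (point 13 minus point 1) minus the server-reported
// handler time (step 7) leaves queueing plus network time on both sides.
std::chrono::microseconds delay_sample(
        std::chrono::steady_clock::time_point sent,      // point 1
        std::chrono::steady_clock::time_point received,  // point 13
        uint32_t handler_us) {                           // step 7, from the server
    using namespace std::chrono;
    return duration_cast<microseconds>(received - sent) - microseconds(handler_us);
}
```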
But it includes the queuing time on both the client and the server, unless I misread the code and the timestamps are not placed at points 1 and 13 as described above.
@@ -487,7 +490,7 @@ private:
 private:
     future<> negotiate_protocol(feature_map map);
     void negotiate(feature_map server_features);
-    future<std::tuple<int64_t, std::optional<rcv_buf>>>
+    future<std::tuple<int64_t, std::optional<uint32_t>, std::optional<rcv_buf>>>
Better to document what the optional means.
Note: microseconds can overflow 32 bits for an rpc call.
It's ~1 hour. Are there handlers that last more?
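For reference, the bound works out as: 2^32 µs = 4,294,967,296 µs ≈ 4,295 s ≈ 71.6 minutes, so a handler would have to run for over an hour to overflow the 32-bit field.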
@@ -430,6 +430,7 @@ class client : public rpc::connection, public weakly_referencable<client> {
     struct reply_handler_base {
         timer<rpc_clock_type> t;
Now the client has two durations with which it can estimate the round-trip time of a packet -- the duration between sending the request and receiving the response, and the time it took the handler to execute on the server side. The difference is the RTT value.
What about time waiting in the send queue (usually none) or the receive queue? Or waiting for the rpc semaphore?
This new metric includes all of these.
Or does your question really mean "let's add those metrics too"?
I mean that the difference is not RTT, it's RTT + waits in queues we don't (and can't) measure.
That's the definition of RTT I got used to:
> Round-trip time (RTT) is the duration it takes for a network request to go from a starting point to a destination and back again to the starting point
and the "starting point" here is whoever calls send_helper::operator(). It doesn't say whether there are or aren't any queues/delays/throttlers/whatever on the way of the request.
I'm fine with renaming it to make this clear, but how do you define RTT then?
RTT is the network time without any queues (which ping measures if it doesn't have a bad day).
Do you subtract service time from the measured round trip?
> RTT is the network time without any queues (which ping measures if it doesn't have a bad day).
The "without any queues" part is a pretty optimistic requirement, I would say. I admit that the outgoing path is likely to be queue-less even in the kernel, but the incoming path consists entirely of queues.
> Do you subtract service time from the measured round trip?
Depends on what "service time" is. If it's the duration of the verb handler, then of course; that's the point (9bbe38c)
@@ -953,8 +964,13 @@ namespace rpc {
_error = true;
} else if (it != _outstanding.end()) {
auto handler = std::move(it->second);
+ auto ht = std::get<1>(msg_id_and_data); // ht stands for "handler time"
_outstanding.erase(it);
(*handler)(*this, msg_id, std::move(data.value()));
+ if (ht) {
+ _stats.rtt_samples++;
+ _stats.rtt_total += (rpc_clock_type::now() - handler->start) - std::chrono::microseconds(*ht);
+ }
} else if (msg_id < 0) {
try {
std::rethrow_exception(unmarshal_exception(data.value()));
src/rpc/rpc.cc (Outdated)
            res += m._c._stats.rtt_total;
        }
        return res.count();
    }, sm::description("Total RTT time in seconds"), { domain_l }),
What does this mean? It will be an average of this-AZ RPC + other-AZ RPC + other-region RPC. Let's say the average is 10ms. What does it mean?
No, no, it won't be averaged between different DCs. The metric is per metrics domain; Scylla configures metrics domains per DC and per verb-group index.
That's how: scylladb/scylladb#15785
The domain as this patch defines it includes two strings:
First, the datacenter the server lives in. This is because grouping metrics for connections to different datacenters makes little sense for several reasons. For example, packet delays differ for local-DC vs cross-DC traffic, and mixing those latencies together is pointless. Another example: the amount of traffic may also differ for local vs cross-DC connections, e.g. because of different use of encryption and/or compression.
Second, each verb-idx gets its own domain. That's to be able to analyze, e.g., query-related traffic separately from gossiper traffic. For that, the existing isolation cookie is taken as-is.
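A hypothetical sketch of composing such a domain name from those two parts (the actual format Scylla uses may differ):

```cpp
#include <string>

// Combine the server's datacenter and the verb-group index (the existing
// isolation cookie) into one metrics-domain name. Purely illustrative.
std::string make_metrics_domain(const std::string& datacenter, unsigned verb_idx) {
    return datacenter + "#" + std::to_string(verb_idx);
}
```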
Ah wow
I share Avi's wow - I didn't even know the concept of "metric domains" existed. As the number of Scylla and Seastar developers grows, we need to be more diligent about documenting things that other developers will need to know. When I wrote Scylla's docs/dev/metrics.md, I spent many hours trying to figure out all the details of how metrics were used in Scylla and explain them all (I didn't write any of the metrics code, so I had to dig and talk to Amnon to find the details). But clearly, in the 6 years that have passed (time flies...) things have changed. We should update this document - and document more of Scylla's basic utilities that every developer needs to know. Moreover, we should perhaps think of documenting the metrics feature in Seastar's documentation, not just in Scylla's...
> When I wrote Scylla's docs/dev/metrics.md, I spent many hours trying to figure out all the details of how metrics were used in Scylla and explain them all (I didn't write any of the metrics code, so I had to dig and talk to Amnon to find the details).
It could be a nice blog post as well.
> We should update this document - and document more of Scylla's basic utilities that every developer needs to know. Moreover, we should perhaps think of documenting the metrics feature in Seastar's documentation, not just in Scylla's...
Is your proposal to update scylla metrics.md with the ... "details" of metrics that come from seastar?
In general I agree, it's worth describing seastar's metrics in more detail than just code (and code comments).
It would have been nice to have Scylla's metrics.md become a "book" that documents at length and in depth the meaning of every single metric (such documentation is sorely missing; I can rarely understand anything in the metrics because of the opaque one-sentence descriptions), but what I focused on in the current metrics.md is the overall structure of all metrics and how to use them. For example, I described the metric labels "shard" and "instance" that every metric has. If we recently added per-scheduling-group labels, and per-"domain" (whatever that is) labels that apply to many different metrics, it would be nice to explain them. If Seastar brings its own set of RPC metrics, it would be nice to explain it.
In general, I wouldn't complain if more documentation gets written :-)
The code should stick to the response_frame_headroom constant, not introduce magic numbers of its own. Signed-off-by: Pavel Emelyanov <[email protected]>
Note the time it took the handler (server-side) callback to run and propagate it via an optional argument down to the .respond() method. Error handlers call respond() too; in that case the argument is passed disengaged. Signed-off-by: Pavel Emelyanov <[email protected]>
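A minimal sketch of what this commit describes, with illustrative names (the real code threads the duration through the connection's internal respond path):

```cpp
#include <chrono>
#include <functional>
#include <optional>

using namespace std::chrono;

// Time the handler callback and hand the measured duration to the
// responder; error paths would pass std::nullopt instead.
void run_and_respond(const std::function<void()>& handler,
                     const std::function<void(std::optional<microseconds>)>& respond) {
    auto start = steady_clock::now();
    handler();
    respond(duration_cast<microseconds>(steady_clock::now() - start));
}
```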
Introduce a new negotiation parameter that tells whether the client wants the server to send back the handler execution duration. This is for compatibility with older clients that expect the server reply to be of some fixed (shorter) size. The new response frame, if negotiated, includes the 32-bit handler duration value in microseconds right next to the existing bits. If no duration was provided (when sending an error response, for example), the value is 0xF..F. After parsing, the duration is returned as an optional value, and 0xF..F maps to a disengaged one. For now the client makes no use of it; the next patch will. Signed-off-by: Pavel Emelyanov <[email protected]>
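A minimal sketch of the sentinel mapping this commit describes (illustrative, not the actual seastar parsing code):

```cpp
#include <cstdint>
#include <optional>

// 0xFFFFFFFF means "no handler duration was provided" and decodes to a
// disengaged optional; any other value is the duration in microseconds.
std::optional<uint32_t> decode_handler_duration(uint32_t raw) {
    if (raw == 0xFFFFFFFFu) {
        return std::nullopt;
    }
    return raw;
}
```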
Now the client has two durations with which it can estimate the round-trip time of a packet -- the duration between sending the request and receiving the response, and the time it took the handler to execute on the server side. The difference is the delay value. Those values from different calls are accumulated in a client counter, as well as the total number of delay samples received (errors and old servers don't send it). Metrics export both as counters, so one can rate them over time and monitor the average delay. Signed-off-by: Pavel Emelyanov <[email protected]>
By default the RPC client uses new frames that include the handler execution time. Tune up one simple test to validate that legacy frames still work too. Signed-off-by: Pavel Emelyanov <[email protected]>
upd:
The new metric is the total time it takes an RPC message to travel back and forth between the verb caller and the server. It implicitly consists of the time the request and response messages spend in seastar and kernel queues on both sides, plus the classical network RTT.
It is calculated only for verbs that require a response. First, the client negotiates with the server that the response frame includes the "handler duration" value, which is the handler callback execution time. Then the client takes the duration between sending the request and receiving the response, minus the handler execution time received from the server. This is called a "delay sample". The total number of samples and their summed duration are exported as counters in the metrics domain.
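A minimal sketch of deriving the average delay from the two exported counters on the monitoring side (the counter names and sampling mechanism are illustrative):

```cpp
#include <cstdint>

// Given two scrapes of the (total delay, sample count) counters,
// the average delay over the interval is delta-total / delta-samples.
double average_delay_seconds(double delay_total_t0, double delay_total_t1,
                             uint64_t samples_t0, uint64_t samples_t1) {
    const uint64_t d_samples = samples_t1 - samples_t0;
    return d_samples ? (delay_total_t1 - delay_total_t0) / d_samples : 0.0;
}
```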
refs: #323
refs: #1753 (for what a metrics domain is)